Semi-classical approach to sequential recombination algorithms for jet clustering 
We derive a new sequential recombination algorithm for reconstructing jets of particles in
high-energy collision events from a simple, semi-classical model of successive uniform massless emissions.
The model results in a different distance measure used to determine the sequence of clustering steps, 
and effectively subtracts background as it reconstructs the jet. We examine the new algorithm's 
behavior in light of existing algorithms, and we find that in Monte Carlo comparisons, the new 
algorithm's robustness against collision backgrounds is comparable to that of other jet algorithms 
when the latter have been augmented by further background subtraction techniques. 
Collimated jets of particles are a distinctive feature of high-energy
elementary particle collisions and are often taken to indicate the
presence of ejected quarks or gluons, particles normally shrouded by the
effects of quantum chromodynamics (QCD). Jet reconstruction therefore
plays a prominent role in event analysis, and as the search for new
physics breaches new thresholds in energy and jet multiplicity,
understanding jet reconstruction itself has taken on new importance.
This is especially true in the study of highly relativistic ("boosted")
objects, in which evidence of heavy or exotic particle production and
decay can be discerned in a jet's substructure. Experimental results on
jet substructure have been published by the CDF [1], ATLAS [2], and
CMS [3] experiments.

A standard class of methods for jet reconstruction in hadron collider
experiments is the sequential recombination algorithm. Different
varieties of this algorithm usually are rooted in physical or geometric
considerations, such as QCD splitting functions for the k_T algorithm
[4, 5], angular ordering for Cambridge-Aachen [6], and collimated jet
cores for anti-k_T [7]. It is also possible to look at jets from the
perspective of the relativistic boosts themselves. This perspective has
been used to motivate, for instance, so-called "variable-R" jet
algorithms [8], which focus on resonance decays within the jet.
Successive gluon emissions in the jet, however, broaden the jet even
further than would be expected from resonance decays alone.

In this article, we consider a simplified, semi-classical model based on
relativistic boosts of these successive emissions. The model is used to
derive a new sequential recombination algorithm which simultaneously
removes background radiation, including initial state radiation as well
as that originating from the unassociated collisions ("pileup") which
are an important feature of modern high-luminosity colliders such as the
Large Hadron Collider. We test the new algorithm on simulated
high-energy W bosons with and without the presence of pileup, and
compare the results with those of other clustering algorithms. We find
that the new algorithm's performance is similar to that of the other
algorithms after further background subtraction ("grooming") techniques
have been applied to the latter. The new algorithm can be a useful
addition to the experimenter's toolkit for resolving the structure of
highly energetic collision products.

In the semi-classical model, we conceive of a parton- 
initiated jet as a parent particle with some effective mass 
and defined energy and direction in the laboratory frame. 
This parent, and its children, undergo a series of rela- 
tively soft, massless emissions which are uniform in their 
own rest frames. Each emission is then boosted into the 
laboratory frame with the direct ancestor's remaining en- 
ergy. The angular probability density in the laboratory 
frame is 
where θ is the angle between the emission and the direct ancestor's
direction, and γ and β are the boost factors into the laboratory frame.
The probability of finding an emission in solid angle dΩ is then
Non-classical effects, such as those from spin and color, are expected
to scatter a small amount of non-spherical radiation [9, 10], and are
neglected in this model. Gluon jets are also broader than those of
quarks; as a result, in common with other generic jet algorithms,
different initial partons may be reconstructed with different
efficiencies.
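
As a reading aid for the boost argument above, the following is a short sketch of the standard kinematics for a massless emission that is isotropic in its emitter's rest frame. These are textbook relations written in notation chosen here (starred quantities denote the rest frame); they are not a reproduction of the article's own displayed expressions, and the small-angle form is an approximation valid only in the stated limit.

```latex
% Rest frame (starred): isotropic emission, dP/d\Omega^* = 1/(4\pi).
% Boost with factors \gamma, \beta along the ancestor's direction.
\begin{align*}
\cos\theta^* &= \frac{\cos\theta - \beta}{1 - \beta\cos\theta},
&
\frac{d\Omega^*}{d\Omega} &= \frac{1}{\gamma^2 (1 - \beta\cos\theta)^2}, \\[4pt]
\frac{dP}{d\Omega} &= \frac{1}{4\pi\,\gamma^2 (1 - \beta\cos\theta)^2},
&
\frac{dP}{d\theta\,d\phi} &= \frac{\sin\theta}{4\pi\,\gamma^2 (1 - \beta\cos\theta)^2}
\;\approx\; \frac{1}{\pi\,\gamma^2\,\theta^3}
\quad (\beta \to 1,\ 1/\gamma \ll \theta \ll 1).
\end{align*}
```

In this limit, maximizing the probability of a splitting amounts to minimizing the combination γ²θ³, that is, a squared energy scale multiplied by a cubed angle.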

Jet clustering can be thought of as choosing the most likely sequence of
1 → 2 splittings to produce the observed jet. Maximizing the above
probability at each step is the same as finding the smallest distance
W_ij between pairs of pre-existing clusters.
where i and j are the clusters, θ_ij the angle between them, and
E_i + E_j the energy sum which takes the place of γ. We have let β ≈ 1
for simplicity. We make the usual replacements, for the hadron collider
environment, of energy E with transverse energy E_T, and θ_ij with
ΔR_ij = √((Δy_ij)² + (Δφ_ij)²), where Δy_ij is the rapidity difference
and Δφ_ij is the difference in azimuthal angle. Expanding the sines and
cosines then gives us the new distance measure

$$ d_{ij} = \frac{1}{4}\,(E_{Ti} + E_{Tj})^2 \left(\frac{\Delta R_{ij}}{R}\right)^3 \qquad (4) $$



where we have introduced the jet scale parameter R, analogous to that of
other inclusive jet algorithms. The effect of the coefficient 1/4 is
that when we define the cluster-beam distance measure
the result is that d_iB < d_ij whenever two jets with the same E_T are
separated by ΔR_ij > R. If E_Ti < E_Tj, we also have d_iB < d_ij
whenever ΔR_ij > R, and therefore R is, as in other algorithms, the
maximum ΔR_ij between clusters that can be merged.
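
The role of the coefficient 1/4 can be checked numerically with a minimal sketch. It assumes the pair measure d_ij = (1/4)(E_Ti + E_Tj)² (ΔR_ij/R)³ and a cluster-beam measure d_iB = E_Ti², the latter being inferred from the equal-E_T argument above rather than quoted; the function names are illustrative only.

```python
def pair_distance(et_i, et_j, delta_r, R=1.0):
    """Assumed pair measure: d_ij = (1/4) (E_Ti + E_Tj)^2 (dR_ij / R)^3."""
    return 0.25 * (et_i + et_j) ** 2 * (delta_r / R) ** 3


def beam_distance(et_i):
    """Assumed cluster-beam measure: d_iB = E_Ti^2 (inferred, see lead-in)."""
    return et_i ** 2


def can_merge(et_i, et_j, delta_r, R=1.0):
    """Pairwise test only: the pair can merge if d_ij undercuts both beam
    distances. A full algorithm compares against all other distances too."""
    d_ij = pair_distance(et_i, et_j, delta_r, R)
    return d_ij < min(beam_distance(et_i), beam_distance(et_j))


if __name__ == "__main__":
    # Equal-E_T pair: the crossover sits at delta_r = R.
    for dr in (0.99, 1.0, 1.01):
        print(dr, can_merge(50.0, 50.0, dr))
    # Unbalanced pair: the softer cluster is set aside well inside R.
    for dr in (0.2, 0.5, 0.8):
        print(dr, can_merge(10.0, 100.0, dr))
```

Under these assumptions, an equal-E_T pair stops merging exactly at ΔR_ij = R, while an unbalanced pair stops merging at a smaller separation, which is the behavior the later comparison with pruning relies on.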

We follow the usual steps for a sequential recombina- 
tion algorithm: the distances dij are calculated for each 
pair of clusters i and j, and diB for each cluster i. If the 
smallest distance is a dij, the pair is merged by adding 
their 4-momenta. If the smallest distance is a diB, the 
cluster is deemed an independent jet and removed from 
further consideration. These steps are repeated until all 
clusters have been deemed jets. This "semi-classical" 
(SC) algorithm is collinear and infrared-safe by construc- 
tion. 
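
The steps above can be sketched as a naive O(N³) loop. This is an illustration under stated assumptions, not the Fastjet plugin used in the article: it assumes d_ij = (1/4)(E_Ti + E_Tj)² (ΔR_ij/R)³ and d_iB = E_Ti², takes E_T = E sin θ, and requires every cluster to have nonzero transverse momentum so that the rapidity is finite. All function names are hypothetical.

```python
import math


def massless(pt, eta, phi):
    """Massless 4-momentum (E, px, py, pz) from pT, pseudorapidity, azimuth."""
    return (pt * math.cosh(eta), pt * math.cos(phi),
            pt * math.sin(phi), pt * math.sinh(eta))


def et_y_phi(p):
    """Transverse energy E*sin(theta), rapidity, and azimuth of a 4-momentum."""
    e, px, py, pz = p
    pt = math.hypot(px, py)
    p_abs = math.sqrt(px * px + py * py + pz * pz)
    et = e * pt / p_abs
    y = 0.5 * math.log((e + pz) / (e - pz))
    return et, y, math.atan2(py, px)


def delta_r(a, b):
    """Distance in the rapidity-azimuth plane, with azimuth wrapped to [0, pi]."""
    _, ya, pa = et_y_phi(a)
    _, yb, pb = et_y_phi(b)
    dphi = abs(pa - pb)
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return math.hypot(ya - yb, dphi)


def cluster(particles, R=1.0):
    """Repeat: find the smallest distance; merge a pair or promote a jet."""
    clusters = [tuple(p) for p in particles]
    jets = []
    while clusters:
        best_d, best_i, best_j = math.inf, -1, None
        for i, ci in enumerate(clusters):
            et_i = et_y_phi(ci)[0]
            d_ib = et_i ** 2                      # assumed beam distance
            if d_ib < best_d:
                best_d, best_i, best_j = d_ib, i, None
            for j in range(i + 1, len(clusters)):
                cj = clusters[j]
                et_j = et_y_phi(cj)[0]
                d_ij = 0.25 * (et_i + et_j) ** 2 * (delta_r(ci, cj) / R) ** 3
                if d_ij < best_d:
                    best_d, best_i, best_j = d_ij, i, j
        if best_j is None:
            # Smallest distance is a beam distance: set the cluster aside.
            jets.append(clusters.pop(best_i))
        else:
            # Smallest distance is a pair distance: add the 4-momenta.
            cj = clusters.pop(best_j)             # best_j > best_i, pop it first
            ci = clusters.pop(best_i)
            clusters.append(tuple(a + b for a, b in zip(ci, cj)))
    return jets


if __name__ == "__main__":
    event = [massless(100.0, 0.0, 0.0),   # hard core
             massless(20.0, 0.1, 0.0),    # nearby soft emission
             massless(10.0, 0.0, 2.0)]    # distant soft particle
    for jet in cluster(event, R=1.0):
        print(jet)
```

In the example event, the nearby soft emission is merged into the hard core, while the distant soft particle is set aside as its own jet. A production implementation would use a nearest-neighbor strategy rather than recomputing every distance on each pass.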

It is useful to compare the new algorithm with the inclusive k_T
algorithm, which uses the pair distance measure
One obvious difference is the increased ΔR_ij exponent. The k_T
algorithm, in common with most other algorithms in general use,
incorporates the factor ΔR_ij² or its close relative (1 − cos θ_ij).
Overall, the effect of the different exponent is not dramatic: larger
exponents have been tested, and, for the most part, merely allow
clusters to merge with larger ΔR_ij, increasing the reconstructed jet
size. For the rest of this article, we keep the exponent at 3, the value
directly motivated by the semi-classical model.

FIG. 1. Comparison of the semi-classical algorithm with pruning. The
diagonally hatched region indicates mergings rejected by the
semi-classical algorithm, while the vertically hatched region is for
pruning. The pruning parameters are taken from [15].

The different energy factor, on the other hand, changes the order in
which clusters are merged and set aside as jets. The k_T algorithm
starts by merging soft clusters, as one would expect for an algorithm
which attempts to reverse the splitting history, but avoids the
perceived problem of the JADE algorithm [11, 12], with distance measure
d_ij = E_i E_j (1 − cos θ_ij), which can allow large-angle clusterings of
very soft pairs. The semi-classical algorithm also starts by merging
soft pairs, though the raised ΔR_ij exponent clusters some high-E_T
clusters sooner if they are sufficiently close. Large-angle clusterings
are suppressed by the R scale and beam clustering. However, the most
significant difference in behavior between the semi-classical and k_T
algorithms is that for sufficiently large ΔR_ij (though still with
ΔR_ij < R), the comparison with d_iB prevents a number of soft clusters
from merging with high-E_T clusters when
As a result, while R defines the maximum extent of a jet in the
semi-classical algorithm, the actual jets are likely to be narrower,
with higher E_T associated with narrower jets. This behavior is similar
to that of jet "pruning", which vetoes mergings which satisfy the two
conditions
and discards the softer of the two clusters [13, 14]. Figure 1 compares
the two methods, with pruning removing the rectangular region in the
(ΔR_ij, z_ij) plane, while the semi-classical algorithm additionally
removes some soft clusters at small angles as well as harder clusters at
large ΔR_ij. As these clusters become stand-alone jets, it is possible
for final jets to be separated by ΔR_ij < R. This behavior follows from
the algorithm's underlying model of a single light parton jet, in which
a large-angle split into two high-energy subjets is unlikely. Instead,
most of the energy is assumed to be highly collimated in a single,
narrow cluster.
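
The comparison with pruning in the (ΔR_ij, z_ij) plane can be made concrete with a small sketch. It rests on several assumptions that are not quoted from the article: the measures d_ij = (1/4)(E_Ti + E_Tj)² (ΔR_ij/R)³ and d_iB = E_Ti², the momentum fraction approximated as z = E_T,soft / (E_T,soft + E_T,hard), and the usual pruning veto of z < z_cut together with ΔR_ij > D_cut. With these, d_iB < d_ij for the softer cluster reduces to z² < (1/4)(ΔR_ij/R)³.

```python
def semi_classical_veto(z, delta_r, R=1.0):
    """True if the softer cluster is set aside instead of merged.

    Assumes d_iB = z^2 * S^2 and d_ij = (1/4) * S^2 * (dR/R)^3, where S is
    the summed transverse energy of the pair, so S cancels in the comparison.
    """
    return z ** 2 < 0.25 * (delta_r / R) ** 3


def pruning_veto(z, delta_r, z_cut=0.1, d_cut=0.2):
    """True if the merging is vetoed by pruning (soft AND wide-angle)."""
    return z < z_cut and delta_r > d_cut


if __name__ == "__main__":
    points = [(0.05, 0.10),   # soft, very close
              (0.01, 0.15),   # very soft, inside D_cut
              (0.05, 0.30),   # soft, wide
              (0.30, 0.90)]   # fairly hard, near the edge of R
    for z, dr in points:
        print(z, dr, semi_classical_veto(z, dr), pruning_veto(z, dr))
```

In this approximation the semi-classical boundary is the curve z = (1/2)(ΔR_ij/R)^(3/2) rather than a rectangle, which is why it also rejects very soft clusters inside D_cut and harder clusters close to ΔR_ij = R.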

FIG. 2. Jet mass distributions for high-p_T jets in the same hemisphere
as the generated W boson, with no pileup and R = 1 for the
semi-classical, k_T, Cambridge-Aachen, and anti-k_T algorithms.

FIG. 3. Jet mass distributions for high-p_T jets in the same hemisphere
as the generated W boson, with an average of 25 pileup events overlaid
and R = 1 for the semi-classical, k_T, Cambridge-Aachen, and anti-k_T
algorithms.

Initial studies of the semi-classical algorithm with boosted objects
have been performed using the Pythia (version 8.150) Monte Carlo
generator [16, 17]. Single hadronically decaying W+parton events were
generated with W p_T > 500 GeV/c at √s = 8 TeV. Non-neutrino particles
were then collected into 0.1 × 0.1 η-φ cells out to |η| < 5, where
η = −ln[tan(θ/2)] is pseudorapidity. Up to an average of 25 QCD minimum
bias events, using Tune 4Cx [18] and the CTEQ6L1 parton distribution
functions [19], were overlaid as "pileup", assuming the same interaction
vertex. Only cells with energy greater than 0.5 GeV were considered for
jet clustering. Jets were then found using the k_T, Cambridge-Aachen,
and anti-k_T algorithms implemented in Fastjet version 3.0.3 [20], and
the semi-classical algorithm implemented as a Fastjet plugin [21]. Jet
masses were calculated by summing the 4-momenta of the cells, assuming
zero mass for each cell.
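
The jet mass calculation described above can be sketched as follows. The cell representation (energy, η, φ) and the function names are assumptions made for illustration; the 0.5 GeV threshold is applied here as a simple filter, whereas in the study it is applied before clustering.

```python
import math


def cell_four_momentum(energy, eta, phi):
    """Massless 4-momentum (E, px, py, pz) for a cell centered at (eta, phi)."""
    pt = energy / math.cosh(eta)
    return (energy, pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta))


def jet_mass(cells, min_energy=0.5):
    """Invariant mass of the summed cell 4-momenta.

    cells: iterable of (energy, eta, phi); cells at or below min_energy
    are skipped.
    """
    e = px = py = pz = 0.0
    for energy, eta, phi in cells:
        if energy <= min_energy:
            continue
        ce, cx, cy, cz = cell_four_momentum(energy, eta, phi)
        e, px, py, pz = e + ce, px + cx, py + cy, pz + cz
    m2 = e * e - px * px - py * py - pz * pz
    # Guard against small negative values from floating-point rounding.
    return math.sqrt(max(m2, 0.0))


if __name__ == "__main__":
    # Two equal-energy cells at eta = 0 separated by 0.2 in azimuth.
    print(jet_mass([(250.0, 0.0, 0.1), (250.0, 0.0, -0.1)]))
```

For two massless cells of energies E1 and E2 with opening angle θ, the result should agree with m² = 2 E1 E2 (1 − cos θ), which is a convenient cross-check.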

Figure 2 shows the jet mass distribution for jets with p_T > 400 GeV/c
in the same hemisphere as the generated W for the different ungroomed
jet algorithms with R = 1. Even with no pileup, the effect of additional
radiation can be seen in the other algorithms, while the semi-classical
peak is narrowest and lies closest, at 80.9 ± 0.1 GeV/c², to the
generated W mass of 80.385 GeV/c². The low and zero-mass bumps are the
result of the semi-classical algorithm "pruning" close but energetically
unbalanced W daughters, as noted above; combining the jet with another
nearby jet recovers the W mass. When the pileup level increases to an
average of 25, as shown in Figure 3, the semi-classical peak shifts
roughly 4 GeV/c² higher, but remains a recognizable, narrow peak, while
the others are much broader due to incorporating pileup radiation.

The effects of additional radiation usually are mitigated by reducing
the R parameter, and indeed one can see in Figure 4 that at R = 0.4, all
the peak masses cluster around 80 GeV/c², then rise rapidly with
increasing R for the other ungroomed algorithms. The semi-classical
algorithm, on the other hand, starts low at R = 0.4, where the two W
daughters often are resolved into different jets, and levels off above
R = 0.7.

FIG. 4. Peak mass vs R for events with zero pileup.

Boosted object analyses, however, typically use large 
R values between 1 and 1.5 in order to remain sensi- 
tive to a larger range of energies. In order to mitigate 
pileup effects in such large jets, the jet can undergo fur- 
ther "grooming" . It is therefore instructive to compare 
the new algorithm with grooming techniques, several of 
which, including pruning, are also shown in Figures 4 and 5. It should
be noted that grooming techniques usually are tailored to particular
environments, and rely on
knowledge of the target final state such as one might use 
to design a search strategy based on individually resolved 
jets. The comparisons shown in this article are therefore 
indicative, leaving optimization for specific signals and 
backgrounds for those particular analyses. 

Pruning has already been described. We start with anti-k_T jets with a
given R, and use the parameters z_cut = 0.1 and D_cut = 0.2 [15] to
prune. We compare the resulting jets with those from the semi-classical
algorithm by itself, with the same R. Not surprisingly, the two
algorithms behave similarly in Figures 4 and 5, even rising at a similar
rate when the average pileup level is 25.
FIG. 5. Peak mass vs R for events with average 25 pileup. For most
values of R for the k_T, Cambridge-Aachen, and anti-k_T algorithms, the
distributions are broad rather than peaked.

FIG. 6. Jet mass distributions for high-p_T semi-classical and pruned
anti-k_T jets in the same hemisphere as the generated W boson, with zero
and average 25 pileup.



Jet mass distributions for R = 1.5 are shown in Figure 6.
The presence of pileup shifts the peaks of the distribu- 
tions upward, as expected. The semi-classical algorithm, 
however, leaves a larger high-mass tail, but also a smaller 
low-mass bump, suggesting that while it eliminates less 
pileup radiation, it retains both W daughters more often. 
It is also evident that the given pruning parameters are 
too aggressive for these particular conditions, resulting 
in a low peak mass. 

Next, we consider the grooming technique of trimming, which attempts to
discern narrow, high-p_T subjets within the parent jet [22, 23]. For the
comparison, we use the Cambridge-Aachen algorithm to recluster within
the parent jet with a smaller radius parameter R_sub = 0.3, and discard
the resulting subjets with p_T < f_sub P_T, where f_sub = 0.05 is a
parameter and P_T is the transverse momentum of the parent jet [15]. The
jet mass is then calculated by summing the remaining high-p_T subjets.
Figure 5 shows trimming to be more stable under these pileup conditions
than pruning or the ungroomed semi-classical algorithm.
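
The subjet selection step of trimming can be sketched as below. The reclustering into subjets is assumed to have been done already (for example with Cambridge-Aachen at R_sub = 0.3); the functions and the 4-momentum tuple layout (E, px, py, pz) are illustrative choices, not part of any library.

```python
import math


def trim(subjets, parent_pt, f_sub=0.05):
    """Keep subjets whose pT is at least f_sub times the parent jet pT."""
    threshold = f_sub * parent_pt
    return [s for s in subjets if math.hypot(s[1], s[2]) >= threshold]


def invariant_mass(four_momenta):
    """Invariant mass of the summed 4-momenta (E, px, py, pz)."""
    e = sum(p[0] for p in four_momenta)
    px = sum(p[1] for p in four_momenta)
    py = sum(p[2] for p in four_momenta)
    pz = sum(p[3] for p in four_momenta)
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


if __name__ == "__main__":
    # Two hard massless subjets 0.4 rad apart, plus one soft subjet.
    subjets = [(300.0, 300.0, 0.0, 0.0),
               (200.0, 200.0 * math.cos(0.4), 200.0 * math.sin(0.4), 0.0),
               (10.0, 0.0, 10.0, 0.0)]
    kept = trim(subjets, parent_pt=500.0, f_sub=0.05)
    print(len(kept), invariant_mass(kept))
```

With a parent p_T of 500 GeV/c and f_sub = 0.05, the threshold is 25 GeV/c, so the 10 GeV/c subjet in the example is discarded before the mass is computed.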



FIG. 7. Trimmed jet mass distributions for high-p_T jets in the same
hemisphere as the generated W boson, with zero and 25 pileup.
Reclustering of the parent anti-k_T jet has been performed using the
Cambridge-Aachen algorithm with R_sub = 0.3, and the semi-classical
algorithm with R_sub = 0.4.



The semi-classical algorithm can be used for recluster- 
ing, and indeed, reclustering is arguably a more natural 
context for the new algorithm than event-level cluster- 
ing, as its underlying model is that of a single-parton 
jet, rather than a complex object incorporating two or 
more energetic decay products. In effect, this method 
combines pruning and conventional trimming. Figure 7 shows the results
of reclustering with the Cambridge-Aachen and semi-classical
algorithms. With the latter,
we use a slightly larger value of Rsub = 0.4 to com- 
pensate for the smaller semi-classical jets. Again, low- 
mass bumps are observed, where the other W daughter 
has been discarded by the trimming technique. As ex- 
pected, the mass distributions are very similar, and are 
also largely insensitive to both the parent jet's R param- 
eter and the pileup level. 

Figures 8 and 9 show the effect of increasing the pileup
level on the different ungroomed and groomed algorithms 
for two large R values. The difference between un- 
groomed and groomed jets is more obvious here, with the 
mass peak rapidly rising and broadening at even mod- 
est levels of pileup for all the ungroomed algorithms ex- 
cept the semi-classical algorithm. The ungroomed semi- 
classical algorithm parallels pruning over this range of 
pileup level, while trimming, with either reclustering al- 
gorithm, is more stable than pruning for this boosted W 
final state. 

In this article, we have used a much simplified, semi- 
classical approach to motivate a new distance measure in 
a sequential recombination algorithm for jet clustering. 
The resulting algorithm effectively combines jet cluster- 
ing with pruning-like behavior in one step. Monte Carlo 
tests with Pythia 8 show the algorithm by itself performing like an
algorithm with jet grooming in terms of stability with respect to the
jet scale parameter R as well as
to pileup. It can also be used to recluster narrow subjets 
FIG. 8. Dependence of mass peak position on pileup for different
algorithms with R = 1. The mass distributions at most pileup levels for
the k_T, Cambridge-Aachen, and anti-k_T algorithms are very broad, with
maxima above 100 GeV/c².



[Figure legend, R = 1.5: semi-classical; k_T; Cambridge-Aachen; anti-k_T + CA prune; anti-k_T + CA trim; anti-k_T + SC trim.]


FIG. 9. Dependence of mass peak position on pileup for different
algorithms with R = 1.5. The mass distributions at most or all pileup
levels for the k_T, Cambridge-Aachen, and anti-k_T algorithms are very
broad, with maxima above 100 GeV/c². The anti-k_T mass distribution
peaks near 100 GeV/c² even at zero pileup.



for trimming. Further work would be needed to deter- 
mine whether cross sections can be calculated for the new 
algorithm without large QCD corrections. At the same 
time, as has been observed widely (and wisely), Monte 
Carlo studies may show the feasibility of a method, but 
they are a far cry from optimizing and testing it in a 
genuine experimental context. 